Abstract

In this paper the relation between nonanticipative rate distortion function
(RDF) and Bayesian filtering theory is further investigated using the topology
of weak convergence of probability measures on abstract spaces. The relation
is established via an optimization on the space of conditional distributions of
the so-called directed information subject to fidelity constraints. Existence of
the optimal reconstruction distribution of the nonanticipative RDF is shown,
while the optimal causal reproduction conditional distribution for stationary
processes is derived in closed form. The realization procedure of nonanticipative
RDF is described, while an example is introduced to illustrate the
concepts.

1. Introduction

In the past, rate distortion (or distortion rate) functions and filtering
theory have evolved independently. Specifically, the classical rate distortion
function (RDF) addresses the problem of reconstruction of a process subject to
a fidelity criterion without much emphasis on the realization of the
reconstruction conditional distribution via causal¹ operations. On the other hand,
filtering theory is developed by imposing real-time realizability on
estimators with respect to measurement data. Specifically, least-squares filtering
theory deals with the characterization of the conditional distribution of the
unobserved process given the measurement data, via a stochastic differential
equation which causally depends on the observation data [1].
Although both reliable communication and filtering (state estimation for
control) are concerned with the reconstruction of processes, the main
underlying assumptions characterizing them are different.

Historically, the work of R. Bucy [2] appears to be the first to consider the
direct relation between distortion rate function and filtering, by carrying out
the computation of a realizable distortion rate function with square criteria
for two samples of the Ornstein-Uhlenbeck process. The work of A. K. Gorbunov
and M. S. Pinsker [3] on ε-entropy defined via a causal constraint on
the reproduction distribution of the RDF, although not directly related to the
realizability question pursued by Bucy, computes the nonanticipative RDF
for stationary Gaussian processes via power spectral densities. Recently, the
authors in [4] investigated relations between filtering theory and RDF defined
via mutual information using the topology of weak* convergence on
appropriately defined spaces. The derivations of the results in [4] require elaborate
arguments.

The objective of this paper is to further investigate the connection between
nonanticipative rate distortion theory and filtering theory for general
distortion functions and random processes on abstract Polish spaces using the
topology of weak convergence. Moreover, instead of mutual information we
invoke directed information with an inherent causality which defines the
reproduction conditional distribution. Further, the connection is established
via optimization of directed information [5] over the space of conditional
distributions which satisfy an average distortion constraint. In comparison to
[4] we impose natural technical assumptions, and we obtain analogous
results under the topology of weak convergence of probability measures. Thus,
the results are easily obtained from Prohorov's theorem without introducing
new spaces as done in [3]. We also present a new example to illustrate the
realization of the filter via nonanticipative RDF.

¹ The terms causal and nonanticipative are used interchangeably with the same meaning
for conditional distributions.

The main results discussed in this paper are the following. 

(1) Existence of optimal reconstruction distribution minimizing directed
information using the topology of weak convergence of probability
measures on Polish spaces;

(2) Closed form expression of the optimal reconstruction conditional
distribution for stationary processes;

(3) Example to demonstrate the realization of the filter.

This work is motivated by recent applications of sensor networks in which
estimators are desired to have a specific accuracy when processing information
[6], [7], and control over limited rate communication channel applications
[8]-[10]. It is important to note that over the years several papers have appeared
in the literature utilizing information theoretic measures for estimator and
control applications [11], [12].

First, we give a brief high-level discussion of the relation between
nonanticipative RDF and filtering theory.

Consider a discrete-time process $X^n \triangleq \{X_0, X_1, \ldots, X_n\} \in \mathcal{X}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{X}_i$,
and its reconstruction $Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{Y}_i$, where $\mathcal{X}_i$ and
$\mathcal{Y}_i$ are Polish spaces.

Bayesian Estimation Theory. In classical filtering, one is given a
mathematical model that generates the process $X^n$,
$\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0,1,\ldots,n\}$, often induced via discrete-time
recursive dynamics, a mathematical model that generates observed data obtained
from sensors, say $Z^n$, $\{P_{Z_i|Z^{i-1},X^i}(dz_i|z^{i-1},x^i) : i = 0,1,\ldots,n\}$, and the
objective is to compute causal estimates of some function of the process $X^n$
based on the observed data $Z^n$. The classical Kalman Filter is a well-known
example, where the estimate $\hat{X}_i \triangleq E[X_i|Z^{i-1}]$, $i = 0,1,\ldots,n$, is the conditional
mean which minimizes the average least-squares estimation error. Thus, in
classical filtering theory both models which generate the unobserved and
observed processes, $X^n$ and $Z^n$, respectively, are given a priori. Fig. 1 is the
block diagram of the filtering problem.

Figure 1: Block Diagram of the Filtering Problem

Nonanticipative Rate Distortion Theory and Estimation. In nonanticipative
rate distortion theory one is given a process $X^n$, which induces a distribution
$\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0,1,\ldots,n\}$, and the objective is to determine the
causal reconstruction conditional distribution
$\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0,1,\ldots,n\}$ which minimizes the directed
information from $X^n$ to $Y^n$ subject to a distortion or fidelity constraint. The
filter $\{Y_i : i = 0,1,\ldots,n\}$ of $\{X_i : i = 0,1,\ldots,n\}$ is found by realizing the
optimal reconstruction distribution
$\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0,1,\ldots,n\}$ via a cascade of sub-systems
as shown in Fig. 2. Thus, in nonanticipative rate distortion theory the
observation or mapping from $\{X_i : i = 0,1,\ldots,n\}$ to $\{Z_i : i = 0,1,\ldots,n\}$ is part
of the realization procedure, while in filtering theory, this mapping is given
a priori. Indeed, this is the main difference between Bayesian estimation
theory and nonanticipative RDF for the purpose of estimation.

Figure 2: Block Diagram of Filtering via Nonanticipative Rate Distortion Function

The precise problem formulation necessitates the definitions of distortion
function or fidelity, and directed information.

The distortion function or fidelity between $x^n$ and its reconstruction $y^n$ is a
measurable function defined by

$$d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \to [0,\infty], \qquad d_{0,n}(x^n,y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i,y^i).$$

The directed information between $X^n$ and $Y^n$, for a given distribution $P_{X^n}(dx^n)$,
and conditional distribution $P_{Y^n|X^n}(dy^n|x^n)$, is defined by [5]²

$$I(X^n \to Y^n) \triangleq \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1}) = I_{X^n \to Y^n}(P_{X_i|X^{i-1},Y^{i-1}}, P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n). \qquad (2)$$

² Unless otherwise stated, integrals with respect to distributions are over the spaces on which
these are defined.

The notation $I_{X^n \to Y^n}(\cdot,\cdot)$ illustrates the dependence of directed information
$I(X^n \to Y^n)$ on the two sequences of nonanticipative or causal conditional
distributions $\{P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot), P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) : i = 0,1,\ldots,n\}$. In
information theory, directed information $I_{X^n \to Y^n}(\cdot,\cdot)$ is often used as a measure
of information from the sequence $(X^i, Y^{i-1})$ over the channel $P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot)$
to the random variable (RV) $Y_i$, $i = 0,1,\ldots,n$. Directed information is also
used in biological applications [13], [14] as a measure of causality, describing
the cause and effect.
In this paper, it is assumed that

$$P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}) = P_{X_i|X^{i-1}}(dx_i|x^{i-1})\text{-a.s.}, \quad \forall i = 0,1,\ldots,n. \qquad (3)$$

The above assumption states that the process $\{X_i : i = 0,1,\ldots,n\}$ is
conditionally independent of $Y^{i-1} = y^{i-1}$ given knowledge of $X^{i-1} = x^{i-1}$. It
can be shown that (3) is implied by the following conditional independence:

$$P_{Y_i|Y^{i-1},X^{\infty}}(dy_i|y^{i-1},x^{\infty}) = P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.}, \quad \forall i = 0,1,\ldots,n.$$

The last assumption states that the reconstruction $Y_i$ does not depend on
future values $X_{i+1}^{\infty} \triangleq \{X_{i+1}, X_{i+2}, \ldots, X_{\infty}\}$, meaning that $Y_i$ is nonanticipative
or causal with respect to the process $\{X_i : i = 0,1,\ldots,n\}$.
Given a probability distribution $P_{X^n}(dx^n)$ and a sequence of conditional
distributions $\{P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n\}$, the directed information utilized in
the definition of nonanticipative RDF is given by

$$I(X^n \to Y^n) = I_{X^n \to Y^n}(P_{X^n}, P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n). \qquad (4)$$

The nonanticipative RDF is defined by

$$R^{c}_{0,n}(D) \triangleq \inf_{\substack{P_{Y_i|Y^{i-1},X^i},\ i = 0,1,\ldots,n: \\ E\{d_{0,n}(X^n,Y^n)\} \le D}} I_{X^n \to Y^n}(P_{X^n}, P_{Y_i|Y^{i-1},X^i} : i = 0,1,\ldots,n). \qquad (5)$$



The definition of the nonanticipative RDF is consistent with [3] in which
nonanticipation is defined via the Markov chain $X_{n+1}^{\infty} \leftrightarrow X^n \leftrightarrow Y^n$, i.e.,
$P_{Y^n|X^{\infty}}(dy^n|x^{\infty}) = P_{Y^n|X^n}(dy^n|x^n)$. Therefore, by finding the solution of
(5), one can realize it via a channel from which one can construct an
optimal filter causally as in Fig. 2.

The paper is organized as follows. Section 2 discusses the formulation
on abstract spaces. Section 3 establishes existence of the optimal minimizing
distribution, and Section 4 derives the optimal minimizing distribution for
stationary processes. Section 5 describes the realization of nonanticipative
RDF, while Section 6 provides an example.

2. Abstract Formulation 

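The source and reconstruction alphabets are sequences of Polish spaces
[15] as defined in the previous section. Probability distributions on any
measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are denoted by $\mathcal{M}_1(\mathcal{Z})$. For $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$
measurable spaces, the set of conditional distributions $P_{Y|X}(\cdot|X = x)$ is
denoted by $\mathcal{Q}(\mathcal{Y}; \mathcal{X})$, and these are equivalent to stochastic kernels on $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$
given $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$.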

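Given the process distributions $P_{X^n}(dx^n)$ and $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0,1,\ldots,n\}$,
the following probability distributions are defined.

(P1): The reconstruction conditional probability distribution
$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\overrightarrow{P}_{Y^n|X^n}(A_{0,n}|x^n) \triangleq \int_{A_0} P_{Y_0|X_0}(dy_0|x_0) \int_{A_1} P_{Y_1|Y_0,X^1}(dy_1|y_0,x^1) \ldots \int_{A_n} P_{Y_n|Y^{n-1},X^n}(dy_n|y^{n-1},x^n), \quad A_{0,n} = \times_{i=0}^{n} A_i \in \mathcal{B}(\mathcal{Y}_{0,n}). \qquad (6)$$

(P2): The joint probability distribution $P_{X^n,Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$:

$$P_{X^n,Y^n}(G_{0,n}) \triangleq (P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n})(G_{0,n}), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$$
$$= \int_{\mathcal{X}_{0,n}} \overrightarrow{P}_{Y^n|X^n}(G_{0,n,x^n}|x^n) P_{X^n}(dx^n)$$

where $G_{0,n,x^n}$ is the $x^n$-section of $G_{0,n}$ at point $x^n$ defined by
$G_{0,n,x^n} \triangleq \{y^n \in \mathcal{Y}_{0,n} : (x^n,y^n) \in G_{0,n}\}$ and $\otimes$ denotes the convolution.

(P3): The marginal distribution $P_{Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$P_{Y^n}(F_{0,n}) \triangleq P(\mathcal{X}_{0,n} \times F_{0,n}), \quad F_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n})$$
$$= \int_{\mathcal{X}_{0,n}} \overrightarrow{P}_{Y^n|X^n}\big((\mathcal{X}_{0,n} \times F_{0,n})_{x^n}|x^n\big) P_{X^n}(dx^n)$$
$$= \int_{\mathcal{X}_{0,n}} \overrightarrow{P}_{Y^n|X^n}(F_{0,n}|x^n) P_{X^n}(dx^n).$$

(P4): The product distribution $\Pi_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}) \to [0,1]$ of
$P_{X^n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ and $P_{Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\Pi_{0,n}(G_{0,n}) \triangleq (P_{X^n} \times P_{Y^n})(G_{0,n}), \quad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$$
$$= \int_{\mathcal{X}_{0,n}} P_{Y^n}(G_{0,n,x^n}) P_{X^n}(dx^n).$$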

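Directed information is defined via the Kullback-Leibler distance:

$$I(X^n \to Y^n) \triangleq \mathbb{D}(P_{X^n,Y^n} \| \Pi_{0,n}) = \mathbb{D}(P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n} \| P_{X^n} \times P_{Y^n})$$
$$= \int \log\Big(\frac{d(P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n})}{d(P_{X^n} \times P_{Y^n})}\Big) d(P_{X^n} \otimes \overrightarrow{P}_{Y^n|X^n})$$
$$= \int \log\Big(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)$$
$$\equiv I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}). \qquad (7)$$

Note that (7) states that directed information is expressed as a functional of
$\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$.

Define the set of all $(n+1)$-fold convolution distributions by

$$\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) \triangleq \Big\{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.}\Big\}.$$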
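For finite alphabets the integrals in (7) reduce to sums, which makes the definition easy to check numerically. The following is a minimal sketch under that assumption; the function names (`causal_kernel_product`, `directed_information`) and the two toy kernels are illustrative and not part of the paper. It builds the causal product of kernels from (P1), the joint and marginal distributions from (P2) and (P3), and then evaluates (7) in bits.

```python
import itertools
import math

def causal_kernel_product(kernels, x, y):
    """Return prod_i P(y_i | y^{i-1}, x^i), the causal product in (P1)."""
    prob = 1.0
    for i, kernel in enumerate(kernels):
        prob *= kernel(y[i], y[:i], x[:i + 1])
    return prob

def directed_information(p_x, kernels, y_alphabet):
    """Evaluate (7) in bits for finite alphabets.

    p_x maps source tuples x^n to probabilities; kernels[i](y_i, y_past, x_upto_i)
    returns P(y_i | y^{i-1}, x^i). Both are hypothetical toy inputs.
    """
    y_seqs = list(itertools.product(y_alphabet, repeat=len(kernels)))
    # Joint distribution (P2): P(x^n, y^n) = P(x^n) * causal kernel product.
    joint = {(x, y): px * causal_kernel_product(kernels, x, y)
             for x, px in p_x.items() for y in y_seqs}
    # Marginal distribution (P3): P(y^n).
    p_y = {y: sum(joint[(x, y)] for x in p_x) for y in y_seqs}
    total = 0.0
    for (x, y), pxy in joint.items():
        if pxy > 0.0:
            total += pxy * math.log2(pxy / (p_x[x] * p_y[y]))
    return total

bits = (0, 1)
p_x = {x: 0.25 for x in itertools.product(bits, repeat=2)}  # two IID uniform bits

def copy(y_i, y_past, x_upto):
    """Noiseless causal kernel: Y_i equals the current source symbol."""
    return 1.0 if y_i == x_upto[-1] else 0.0

def coin(y_i, y_past, x_upto):
    """Kernel that ignores the source entirely."""
    return 0.5

i_copy = directed_information(p_x, [copy, copy], bits)
i_coin = directed_information(p_x, [coin, coin], bits)
```

In this toy setting the noiseless kernel transfers both source bits, while the kernel that ignores the source transfers nothing, which matches the interpretation of directed information as information flowing causally from $X^n$ to $Y^n$.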



Next, the definition of nonanticipative RDF is given. 

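Definition 2.1. (Nonanticipative Rate Distortion Function) Suppose
$d_{0,n} \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i,y^i)$, where $\rho_{0,i} : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \to [0,\infty)$ is a sequence of
$\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable distortion functions, and let $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ (assuming
it is non-empty) denote the average distortion or fidelity constraint defined by

$$\overrightarrow{\mathcal{Q}}_{0,n}(D) \triangleq \Big\{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) \triangleq \int d_{0,n}(x^n,y^n) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \le D\Big\}, \quad D \ge 0. \qquad (8)$$

The nonanticipative RDF is defined by

$$R^{c}_{0,n}(D) \triangleq \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}). \qquad (9)$$

Clearly, $R^{c}_{0,n}(D)$ is characterized by minimizing directed information or
equivalently $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ over $\overrightarrow{\mathcal{Q}}_{0,n}(D)$.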

3. Existence of Reconstruction Conditional Distribution 

In this section, the existence of the minimizing $(n+1)$-fold convolution of
conditional distributions in (9) is established by using the topology of weak
convergence of probability measures on Polish spaces. Before we present the
relevant results we state some properties of the average distortion set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$
and directed information $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$. These properties are
derived in [16].

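Theorem 3.1. [16] Let $\{\mathcal{X}_n : n \in \mathbb{N}\}$ and $\{\mathcal{Y}_n : n \in \mathbb{N}\}$ be Polish spaces.
Then

(1) The set $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is convex.

(2) $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is a convex functional of $\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$
for a fixed $P_{X^n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$.

(3) The set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is convex.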

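Let $BC(\mathcal{Y}_{0,n})$ denote the set of bounded continuous real-valued functions
on $\mathcal{Y}_{0,n}$. A sequence $\{P_n : n \ge 1\}$ of probability measures is said to converge
weakly to $P \in \mathcal{M}_1(\mathcal{X})$ if

$$\lim_{n \to \infty} \int_{\mathcal{X}} f(x) dP_n(x) = \int_{\mathcal{X}} f(x) dP(x), \quad \forall f \in BC(\mathcal{X}).$$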
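As a small illustration of this definition (not taken from the paper), consider the Dirac measures $P_n = \delta_{1/n}$ on the real line, which converge weakly to $P = \delta_0$. The sketch below, with hypothetical helper names, checks the defining limit for one bounded continuous test function and shows that the limit fails for a bounded but discontinuous function, which is why the definition is restricted to $BC(\mathcal{X})$.

```python
import math

def integral_against_dirac(f, point):
    """Integral of f with respect to the Dirac measure concentrated at `point`."""
    return f(point)

bounded_continuous = math.cos  # a bounded continuous test function on the real line

def step(x):
    """Bounded test function that is discontinuous at 0."""
    return 1.0 if x <= 0.0 else 0.0

ns = [1, 10, 100, 1000, 10000]
# |int f dP_n - int f dP| for P_n = delta_{1/n} and P = delta_0.
gaps_continuous = [abs(integral_against_dirac(bounded_continuous, 1.0 / n)
                       - integral_against_dirac(bounded_continuous, 0.0)) for n in ns]
gaps_step = [abs(integral_against_dirac(step, 1.0 / n)
                 - integral_against_dirac(step, 0.0)) for n in ns]
```

The gaps for the continuous function shrink towards zero as $n$ grows, whereas the gaps for the step function stay equal to one for every $n$.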

Below, we introduce the main conditions for establishing existence of the
nonanticipative RDF (9).



Assumption 3.2. The following conditions are assumed throughout the
paper.

(1) $\mathcal{Y}_{0,n}$ is a compact Polish space, $\mathcal{X}_{0,n}$ is a Polish space;

(2) for all $h(\cdot) \in BC(\mathcal{Y}_{0,n})$, the function mapping

$$(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1} \mapsto \int_{\mathcal{Y}_n} h(y) P_{Y|Y^{n-1},X^n}(dy|y^{n-1},x^n) \in \mathbb{R}$$

is continuous jointly in the variables $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1}$;

(3) $d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$;

(4) the distortion level $D$ is such that there exists a sequence $(x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}$
satisfying $d_{0,n}(x^n,y^n) < D$.



Note that since $\mathcal{Y}_{0,n}$ is assumed to be a compact Polish space, then by [15]
probability measures on $\mathcal{Y}_{0,n}$ are weakly compact. Moreover, the following
weak compactness result can be obtained, which we use to show existence of
an optimal nonanticipative RDF, $R^{c}_{0,n}(D)$.



Lemma 3.3. Suppose Assumption 3.2, (1), (2) hold.
Then

(1) The set $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is weakly compact.

(2) Under the additional conditions (3), (4) the set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is a closed
subset of $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ (hence compact).

Proof. (1) This follows from the fact that any $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$
is factorized as $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$-a.s., where
$P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $1 \le i \le n$, and $\mathcal{Y}_{0,n}$ is a compact
Polish space, which implies that $\{P_{Y_i|Y^{i-1},X^i}(\cdot|y^{i-1},x^i) : y^{i-1} \in \mathcal{Y}_{0,i-1}, x^i \in \mathcal{X}_{0,i}\}$
is compact, hence by Prohorov's theorem it is uniformly tight $\forall i$. Utilizing
this, by induction it can be shown that the family of convolution measures
$\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is compact.

(2) Utilizing compactness of $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ and condition (3) of Assumption 3.2
on $d_{0,n}(x^n, \cdot)$, it can be shown that $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is a closed subset of
$\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, and hence by Prohorov's theorem it is compact. □

The previous results utilize Prohorov's theorem that relates tightness and
weak compactness.

The next theorem establishes existence of the minimizing reconstruction
kernel for (9). We need the following lemma derived in [16].



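Lemma 3.4. Under Assumption 3.2, (1), (2), $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is lower
semicontinuous on $\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ for a fixed $P_{X^n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$.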



By Lemma 3.3 and Lemma 3.4 we have the following result.

Theorem 3.5. Suppose the conditions of Lemma 3.3 hold. Then $R^{c}_{0,n}(D)$
has a minimum.

Proof. See Appendix. □ 

4. Optimal Reconstruction of Nonanticipative Rate Distortion Func- 
tion 

In this section the form of the optimal reconstruction conditional distri- 
bution is derived under a stationarity assumption. The method is based on 
calculus of variations on the space of measures. We introduce the following 
main assumption. 



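Assumption 4.1. The $(n+1)$-fold convolution of conditional distributions
$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$-a.s. is the convolution of
stationary conditional distributions.

Assumption 4.1 holds for a stationary process $\{(X_i, Y_i) : i \in \mathbb{N}\}$ and
$\rho_{0,i}(x^i,y^i) \equiv \rho(T^i x^n, T^i y^n)$, where $T^i x^n$ is the shift operator on $x^n$ (and similarly for $T^i y^n$).

The consequence of Assumption 4.1, which holds for stationary processes
and a single letter distortion function, is that the Gateaux differential of
$I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is done in only one direction (since
$P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$ are stationary). Therefore, we define the variation of
$\overrightarrow{P}^{0}_{Y^n|X^n}$ in the direction of $\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}$ via
$\overrightarrow{P}^{\epsilon}_{Y^n|X^n} \triangleq \overrightarrow{P}^{0}_{Y^n|X^n} + \epsilon(\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})$, $\epsilon \in [0,1]$,
since under Assumption 4.1, the functionals
$\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i}) : i = 0,1,\ldots,n\}$ are identical.

Theorem 4.2. Suppose Assumption 4.1 holds and
$\mathbb{I}_{P_{X^n}}(\overrightarrow{P}_{Y^n|X^n}) \triangleq I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is well defined for every
$\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)$, possibly taking values from the set $[0,\infty]$. Then
$\overrightarrow{P}_{Y^n|X^n} \to \mathbb{I}_{P_{X^n}}(\overrightarrow{P}_{Y^n|X^n})$ is Gateaux differentiable at every point in
$\overrightarrow{\mathcal{Q}}_{0,n}(D)$, and the Gateaux derivative at the point $\overrightarrow{P}^{0}_{Y^n|X^n}$ in the direction
$\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}$ is given by

$$\delta\mathbb{I}_{P_{X^n}}(\overrightarrow{P}^{0}_{Y^n|X^n}; \overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}) = \int \log\Big(\frac{\overrightarrow{P}^{0}_{Y^n|X^n}(dy^n|x^n)}{P^{0}_{Y^n}(dy^n)}\Big) (\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})(dy^n|x^n) P_{X^n}(dx^n)$$

where $P^{0}_{Y^n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the marginal measure corresponding to
$\overrightarrow{P}^{0}_{Y^n|X^n} \otimes P_{X^n}(dx^n) \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$.

Proof. The proof is similar to the one in [17] (although it is more involved).
□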

The constrained problem defined by (9) can be reformulated as an
unconstrained problem using Lagrange multipliers. The equivalence of constrained
and unconstrained problems is established next.



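Lemma 4.3. Suppose Assumptions 3.2, 4.1 hold and consider
$d_{0,n}(x^n,y^n) \triangleq \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$, where $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \to [0,\infty]$ is continuous in
the second argument. Then the constrained problem as stated in Theorem 3.5
is equivalent to the unconstrained problem stated below.

$$\inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$$
$$= \max_{s \le 0} \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})} \Big\{ I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) - D\big) \Big\}$$
$$= \max_{s \le 0} \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})} \Big\{ I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\Big( \int d_{0,n}(x^n,y^n) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) - D \Big) \Big\}.$$

Further, the infimum occurs on the boundary of the set $\overrightarrow{\mathcal{Q}}_{0,n}(D)$.

Proof. The proof utilizes the Lagrange duality theorem [18], Theorem 3.1
and Lemma 3.3. □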



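Utilizing Lemma 4.3, then

$$R^{c}_{0,n}(D) = \sup_{s \le 0} \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})} \Big\{ I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) - D\big) \Big\}. \qquad (10)$$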
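The dual form (10) can be checked numerically in the simplest special case. The sketch below is an illustration under stated assumptions, not the paper's method: it takes a single time step ($n = 0$), a uniform binary source and Hamming distortion, for which the classical rate distortion function is $\log 2 - H_b(D)$ in nats. It also assumes that the inner infimum is attained with a uniform output distribution, so the dual objective has the closed form used in `dual_objective`. The outer maximization over $s \le 0$ is done with a ternary search, which is valid because the objective is concave in $s$.

```python
import math

def dual_objective(s, D):
    """s*D - log(sum_y e^{s*rho(x,y)} q(y)) for uniform q and Hamming rho (nats)."""
    return s * D - math.log((1.0 + math.exp(s)) / 2.0)

def maximize_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def binary_entropy_nats(p):
    """Binary entropy H_b(p) in nats for 0 < p < 1."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

D = 0.1
s_star = maximize_concave(lambda s: dual_objective(s, D), -20.0, 0.0)
rate_dual = dual_objective(s_star, D)
rate_closed_form = math.log(2.0) - binary_entropy_nats(D)
```

Under these assumptions the maximizing multiplier is strictly negative and equals $\log(D/(1-D))$, and the dual value agrees with the closed-form rate, which is consistent with the role of $s$ as the slope of the rate distortion curve.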

Note that $\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ are probability measures on $\mathcal{Y}_{0,n}$; therefore,
one should introduce another set of Lagrange multipliers to obtain an
unconstrained problem free of such a constraint.

Since $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)$ is a consistent probability
measure on $\mathcal{Y}_{0,n}$, then for each $k = 0,1,\ldots,n$, $\int_{\mathcal{Y}_{0,k}} \overrightarrow{P}_{Y^k|X^k}(dy^k|x^k) = 1$.
This constraint is expressed via

$$\sum_{i=0}^{n} \int \lambda_i(x^i, y^{i-1}) \big(\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) - 1\big) P_{X^n}(dx^n) \qquad (11)$$

where $\{\lambda_i(\cdot,\cdot) : i = 0,1,\ldots,n\}$ are Lagrange multipliers.
The above observations yield the following theorem.
The above observations yield the following theorem. 

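Theorem 4.4. Suppose the assumptions of Lemma 4.3 hold and consider
$d_{0,n}(x^n,y^n) = \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$. Then

(1) The infimum in (10) is attained at $\overrightarrow{P}^{*}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)$ given by

$$\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) = \otimes_{i=0}^{n} \frac{e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}, \quad s \le 0 \qquad (12)$$

and $P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1}) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$.

(2) The nonanticipative RDF is given by

$$R^{c}_{0,n}(D) = sD - \sum_{i=0}^{n} \int \log\Big( \int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1}) \Big) \overrightarrow{P}^{*}_{Y^{i-1}|X^{i-1}}(dy^{i-1}|x^{i-1}) \otimes P_{X^i}(dx^i). \qquad (13)$$

If $R^{c}_{0,n}(D) > 0$ then $s < 0$ and

$$\sum_{i=0}^{n} \int \rho(T^i x^n, T^i y^n) \overrightarrow{P}^{*}_{Y^i|X^i}(dy^i|x^i) \otimes P_{X^i}(dx^i) = D. \qquad (14)$$

⁵ Due to the stationarity assumption, $P_{Y_i|Y^{i-1}}(\cdot|\cdot) = P(\cdot|\cdot)$ and $P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) = P(\cdot|\cdot,\cdot)$.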



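Proof. The fully unconstrained problem of (10) is obtained by introducing
the set of Lagrange multipliers $\{\lambda_i(\cdot,\cdot) : i = 0,1,\ldots,n\}$ defined in (11).
Using the pair of Lagrange multipliers $\{s, \lambda \triangleq \{\lambda_i(\cdot,\cdot) : i = 0,1,\ldots,n\}\}$
introduce the extended pay-off functional

$$\mathbb{I}^{s,\lambda}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \triangleq I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{P}_{Y^n|X^n}) - D\big) + \sum_{i=0}^{n} \int \lambda_i(x^i,y^{i-1}) \big(\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) - 1\big) P_{X^n}(dx^n).$$

This is a fully unconstrained problem. Utilizing Theorem 4.2, the Gateaux
derivative of $\mathbb{I}^{s,\lambda}(P_{X^n}, \cdot)$ on $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ at any point $\overrightarrow{P}^{0}_{Y^n|X^n}$ in the direction
$\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n}$ is given by

$$\delta\mathbb{I}^{s,\lambda}(P_{X^n}, \overrightarrow{P}^{0}_{Y^n|X^n}; \overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})$$
$$= \int \log\Big(\frac{\overrightarrow{P}^{0}_{Y^n|X^n}(dy^n|x^n)}{P^{0}_{Y^n}(dy^n)}\Big) (\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})(dy^n|x^n) P_{X^n}(dx^n)$$
$$- s \int d_{0,n}(x^n,y^n) (\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})(dy^n|x^n) P_{X^n}(dx^n)$$
$$+ \sum_{i=0}^{n} \int \lambda_i(x^i,y^{i-1}) (\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})(dy^n|x^n) P_{X^n}(dx^n)$$
$$= \int \log\Big(\frac{e^{\sum_{i=0}^{n}(-s\rho(T^i x^n, T^i y^n) + \lambda_i(x^i,y^{i-1}))}\, \overrightarrow{P}^{0}_{Y^n|X^n}(dy^n|x^n)}{P^{0}_{Y^n}(dy^n)}\Big) (\overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{0}_{Y^n|X^n})(dy^n|x^n) P_{X^n}(dx^n), \quad \forall \overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}).$$

Since $\mathbb{I}^{s,\lambda}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is convex in $\overrightarrow{P}_{Y^n|X^n}$, it follows from the calculus
of variations principle that a necessary and sufficient condition for $\overrightarrow{P}^{*}_{Y^n|X^n}$
to be a minimizer is $\delta\mathbb{I}^{s,\lambda}(P_{X^n}, \overrightarrow{P}^{*}_{Y^n|X^n}; \overrightarrow{P}_{Y^n|X^n} - \overrightarrow{P}^{*}_{Y^n|X^n}) = 0$,
$\forall \overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$. Since the Gateaux derivative must be zero for all
$\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, then

$$\frac{e^{\sum_{i=0}^{n}(-s\rho(T^i x^n, T^i y^n) + \lambda_i(x^i,y^{i-1}))}\, \overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n)}{P^{*}_{Y^n}(dy^n)} = 1\text{-a.s.}$$

Equivalently,

$$\otimes_{i=0}^{n} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) = \otimes_{i=0}^{n} e^{s\rho(T^i x^n, T^i y^n) - \lambda_i(x^i,y^{i-1})} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})\text{-a.s.}$$

Since $\int_{\mathcal{Y}_i} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) = 1$, then

$$\lambda_i(x^i,y^{i-1}) = \log \int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1}), \quad i = 0,1,\ldots,n.$$

Hence,

$$\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} \frac{e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)} P^{*}_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}.$$

Since $s \le 0$ and $\lambda_i \le 0$, $i = 0,1,\ldots,n$, then $\overrightarrow{P}^{*}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.

Substituting $\overrightarrow{P}^{*}_{Y^n|X^n}$ into $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ gives (13).

Note that for $s = 0$, $R^{c}_{0,n}(D) = 0$ and $\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = P_{Y^n}(dy^n)$ for
$P_{X^n}$-almost all $x^n \in \mathcal{X}_{0,n}$. This is trivial, so we must have $s < 0$. From Lemma 4.3
the solution occurs on the boundary of $\overrightarrow{\mathcal{Q}}_{0,n}(D)$, giving (14) for $s < 0$. □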
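The exponential form (12) is implicit, because the output kernel $P^{*}_{Y_i|Y^{i-1}}$ on its right-hand side is itself the marginal induced by the left-hand side. In the single-letter special case ($n = 0$, finite alphabets) this fixed point can be approached with a Blahut-Arimoto style alternation, which is a classical technique and not a procedure stated in the paper. The sketch below assumes a uniform binary source with Hamming distortion and a fixed multiplier $s < 0$; the helper names are hypothetical. After convergence it evaluates the distortion as in (14) and the rate through the single-letter version of (13).

```python
import math

def tilted_kernel(p_x, rho, s, q):
    """Kernel proportional to e^{s*rho(x,y)} q(y), the single-letter form of (12)."""
    kernel = []
    for x in range(len(p_x)):
        weights = [math.exp(s * rho[x][y]) * q[y] for y in range(len(q))]
        z = sum(weights)
        kernel.append([w / z for w in weights])
    return kernel

def solve_fixed_point(p_x, rho, s, iters=500):
    """Alternate between the tilted kernel and its induced output marginal."""
    n_x, n_y = len(p_x), len(rho[0])
    start = [j + 1.0 for j in range(n_y)]          # deliberately non-uniform start
    q = [v / sum(start) for v in start]
    for _ in range(iters):
        kernel = tilted_kernel(p_x, rho, s, q)
        q = [sum(p_x[x] * kernel[x][y] for x in range(n_x)) for y in range(n_y)]
    kernel = tilted_kernel(p_x, rho, s, q)
    # Average distortion, the single-letter analogue of (14).
    distortion = sum(p_x[x] * kernel[x][y] * rho[x][y]
                     for x in range(n_x) for y in range(n_y))
    # Rate in nats, the single-letter analogue of (13).
    log_norm = sum(p_x[x] * math.log(sum(math.exp(s * rho[x][y]) * q[y]
                                         for y in range(n_y)))
                   for x in range(n_x))
    rate = s * distortion - log_norm
    return q, kernel, distortion, rate

p_x = [0.5, 0.5]
hamming = [[0.0, 1.0], [1.0, 0.0]]
s = -2.0
q, kernel, distortion, rate = solve_fixed_point(p_x, hamming, s)
```

For this symmetric example the output marginal converges to the uniform distribution, the distortion equals $e^{s}/(1+e^{s})$, and the rate coincides with $\log 2 - H_b(D)$ in nats, so each value of $s < 0$ traces one point on the rate distortion curve.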

Remark 4.5. Note that if the distortion function satisfies $\rho(T^i x^n, T^i y^n) = \rho(x_i, T^i y^n)$ then

$$P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) = P^{*}_{Y_i|Y^{i-1},X_i}(dy_i|y^{i-1},x_i)\text{-a.s.}, \quad i = 0,1,\ldots,n.$$

That is, the reconstruction kernel is Markov in $X^n$. However, without further
restrictions one cannot claim that this conditional distribution is also Markov
with respect to $\{Y_i : i = 0,1,\ldots,n\}$.

Note that unlike [3], we have derived the main results in a more
straightforward approach utilizing the weak convergence of probability measures.

5. Realization of Nonanticipative Rate Distortion Function



The realization of the nonanticipative RDF (optimal reconstruction
conditional distribution) is equivalent to the sensor mapping as shown in Fig. 2
which produces the auxiliary random process $\{Z_i : i \in \mathbb{N}\}$ that will be used
for filtering. This is equivalent to identifying a communication channel, an
encoder and a decoder such that the reconstruction from the sequence $X^n$
to the sequence $Y^n$ matches the nonanticipative rate distortion minimizing
reconstruction kernel. Fig. 3 illustrates the cascade sub-systems that
realize the nonanticipative RDF, which is consistent with the discussion in the
introduction.

Definition 5.1. Given a source $\{P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}) : i = 0,\ldots,n\}$,
a channel $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) : i = 0,\ldots,n\}$ is a realization of the
optimal reconstruction distribution if there exists a pre-channel encoder
$\{P_{A_i|A^{i-1},B^{i-1},X^i}(da_i|a^{i-1},b^{i-1},x^i) : i = 0,\ldots,n\}$ and a post-channel decoder
$\{P_{Y_i|Y^{i-1},B^i}(dy_i|y^{i-1},b^i) : i = 0,\ldots,n\}$ such that

$$\overrightarrow{P}^{*}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P^{*}_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.}$$

where the joint distribution is

$$P_{X^n,A^n,B^n,Y^n}(dx^n, da^n, db^n, dy^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},B^i}(dy_i|y^{i-1},b^i) \otimes P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) \otimes P_{A_i|A^{i-1},B^{i-1},X^i}(da_i|a^{i-1},b^{i-1},x^i) \otimes P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1})\text{-a.s.}$$

The filter is given by $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0,\ldots,n\}$ or by
$\{P_{X_i|Y^{i-1}}(dx_i|y^{i-1}) : i = 0,\ldots,n\}$.


Figure 3: Block Diagram of Realizable Nonanticipative Rate Distortion Function
(blocks: Source, Encoder, Channel, Decoder; the cascade realizes the Optimal Reconstruction Kernel)



Thus, if $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1},a^i) : i = 0,\ldots,n\}$ is a realization of the
nonanticipative RDF minimizing distribution then the channel connecting the
source, encoder, channel, decoder achieves the nonanticipative RDF, and
the filter is obtained. Clearly, $\{B_i : i = 0,1,\ldots,n\}$ is an auxiliary random
process which is needed to obtain the filter $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0,\ldots,n\}$.
In the next section, we provide an example for such a realization.



6. Example 

Consider the following discrete-time partially observed linear Gauss-Markov
system described by

$$\begin{cases} X_{t+1} = A X_t + B W_t, & X_0 = X, \quad t \in \mathbb{N}^n \\ Y_t = C X_t + D V_t, & t \in \mathbb{N}^n \end{cases} \qquad (15)$$

where $X_t \in \mathbb{R}^m$ is the state (unobserved) process of the information source
(plant), and $Y_t \in \mathbb{R}^p$ is the partially observed (measurement) process.
Assume that $(C, A)$ is detectable and $(A, (BB^{tr})^{1/2})$ is stabilizable, $(D \neq 0)$. The
state and observation noise $\{(W_t, V_t) : t \in \mathbb{N}\}$ are mutually independent,
independent of the Gaussian RV $X_0$, with parameters $N(\bar{x}_0, \bar{V}_0)$, where $W_t \in \mathbb{R}^k$
and $V_t \in \mathbb{R}^d$ are Gaussian IID processes with zero mean and identity
covariances.

Figure 4: Communication System
(blocks: Noise Generator, Unobserved Process, Information Source (Plant), Noisy Channel)

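The realization will be done following Fig. 4. The objective is to reconstruct
$\{Y_t : t \in \mathbb{N}\}$ by $\{\tilde{Y}_t : t \in \mathbb{N}\}$ causally. The distortion is single letter defined
by

$$d_{0,n}(y^n, \tilde{y}^n) \triangleq \frac{1}{n+1} \sum_{i=0}^{n} \|y_i - \tilde{y}_i\|^2.$$

The objective is to compute

$$R^{c}_{0,n}(D) = \inf_{\overrightarrow{P}_{\tilde{Y}^n|Y^n} \in \overrightarrow{\mathcal{Q}}_{0,n}(D)} I_{Y^n \to \tilde{Y}^n}(P_{Y^n}, \overrightarrow{P}_{\tilde{Y}^n|Y^n})$$

and then realize the reconstruction distribution. The filter realization
procedure is similar to the one found in the reconstruction of $\{X_t : t \in \mathbb{N}\}$ in [19],
although here we realize a vector source over a scalar channel. The
methodology, however, is based on the explicit formulae of optimal reconstruction of
Theorem 4.4. According to Theorem 4.4 the optimal reconstruction is given
by

$$\overrightarrow{P}^{*}_{\tilde{Y}^n|Y^n}(d\tilde{y}^n|y^n) = \otimes_{i=0}^{n} \frac{e^{s\|\tilde{y}_i - y_i\|^2} P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1}}(d\tilde{y}_i|\tilde{y}^{i-1})}{\int_{\tilde{\mathcal{Y}}_i} e^{s\|\tilde{y}_i - y_i\|^2} P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1}}(d\tilde{y}_i|\tilde{y}^{i-1})}, \quad s \le 0. \qquad (16)$$

Hence, from (16) it follows that
$P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1},Y^i} = P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1},Y_i}(d\tilde{y}_i|\tilde{y}^{i-1},y_i)$-a.s.;
that is, the reconstruction is Markov with respect to the process $\{Y_i : i \in \mathbb{N}\}$.
Moreover, since the exponential term $\|\tilde{y}_i - y_i\|^2$ in the right hand side of (16)
is quadratic in $(y_i, \tilde{y}_i)$, and $\{X_i : i \in \mathbb{N}\}$ is Gaussian, then $\{(X_i, \tilde{Y}_i) : i \in \mathbb{N}\}$
is jointly Gaussian, hence it follows that $P^{*}_{\tilde{Y}_i|\tilde{Y}^{i-1},Y_i}(\cdot|\tilde{y}^{i-1},y_i)$ is Gaussian (for
a fixed realization of $(\tilde{y}^{i-1}, y_i)$). Hence, it has the general form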


$$\tilde{Y}_t = \bar{A}_t Y_t + \bar{B}_t \tilde{Y}^{t-1} + Z_t \qquad (17)$$

where $\bar{A}_t \in \mathbb{R}^{p \times p}$, $\bar{B}_t \in \mathbb{R}^{p \times tp}$, and $\{Z_t : t \in \mathbb{N}\}$ is an independent sequence
of Gaussian vectors.

The communication channel (17) can be realized via a scalar additive
Gaussian noise channel with feedback defined by

$$B_t = A_t + Z_t, \quad t \in \mathbb{N} \qquad (18)$$

where the encoder is a mapping $A_t = \Phi_t(Y_t, \tilde{Y}^{t-1})$ with power $P_t \triangleq Tr\{E\{(A_t)^2\}\}$.
For $A_t$ Gaussian the directed information is $I(A^t \to B^t) = \log|1 + E\{(A_t)^2\} Cov(Z_t)^{-1}|$.
The decoder at time $t \in \mathbb{N}$ receives $B^t$ and computes the reconstruction
$\tilde{Y}_t = \Psi_t(B^t, \tilde{Y}^{t-1})$.

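Realization of the nonanticipative RDF. The realization is based on the block
diagram of Fig. 5. The encoder $\Phi_t(\cdot,\cdot)$ consists of a pre-encoder which
produces the Gaussian innovation process $\{K_t : t \in \mathbb{N}\}$, defined by

$$K_t \triangleq Y_t - E\{Y_t | \sigma\{\tilde{Y}^{t-1}\}\}, \quad t \in \mathbb{N} \qquad (19)$$

whose covariance is defined by $\Lambda_t \triangleq E\{K_t K_t^{tr}\}$. The decoder consists of a
pre-decoder $\{\tilde{K}_t : t \in \mathbb{N}\}$ which is defined by

$$\tilde{K}_t \triangleq \tilde{Y}_t - E\{Y_t | \sigma\{\tilde{Y}^{t-1}\}\}, \quad t \in \mathbb{N}. \qquad (20)$$

Note that the fidelity criterion satisfies
$d_{0,n}(y^n, \tilde{y}^n) = d_{0,n}(k^n, \tilde{k}^n) = \frac{1}{n+1} \sum_{i=0}^{n} \|\tilde{k}_i - k_i\|^2$.
Let $\{E_t : t \in \mathbb{N}\}$ be the unitary matrix that diagonalizes $\{\Lambda_t : t \in \mathbb{N}\}$,
such that

$$E_t \Lambda_t E_t^{tr} = diag\{\lambda_{t,1}, \ldots, \lambda_{t,p}\}, \quad t \in \mathbb{N}.$$

Choose $\{\xi_t : t \in \mathbb{N}\}$ such that

$$\delta_{t,i} \triangleq \begin{cases} \xi_t & \text{if } \xi_t \le \lambda_{t,i} \\ \lambda_{t,i} & \text{if } \xi_t > \lambda_{t,i} \end{cases}, \quad t \in \mathbb{N}, \quad i = 1,\ldots,p \qquad (21)$$

where $\{\xi_t : t \in \mathbb{N}\}$ satisfies $\sum_{i=1}^{p} \delta_{t,i} = D$.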
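The choice of $\xi_t$ in (21) is the reverse water-filling rule: the water level is raised until the per-component distortions $\delta_{t,i} = \min(\xi_t, \lambda_{t,i})$ sum to $D$. A minimal sketch for one fixed $t$ follows, using bisection on the water level. The eigenvalues and distortion level are made-up numbers, the function name is hypothetical, and the rate expression $\frac{1}{2}\sum_i \log(\lambda_i/\delta_i)$ is the classical formula for parallel Gaussian components, assumed here rather than taken from the text.

```python
import math

def reverse_water_filling(eigenvalues, D, iters=200):
    """Return (xi, deltas) with delta_i = min(xi, lambda_i) and sum(delta_i) = D."""
    if not 0.0 < D <= sum(eigenvalues):
        raise ValueError("D must lie in (0, sum of eigenvalues]")
    lo, hi = 0.0, max(eigenvalues)
    for _ in range(iters):  # bisection on the water level xi
        xi = 0.5 * (lo + hi)
        if sum(min(xi, lam) for lam in eigenvalues) < D:
            lo = xi
        else:
            hi = xi
    xi = 0.5 * (lo + hi)
    return xi, [min(xi, lam) for lam in eigenvalues]

lambdas = [4.0, 1.0, 0.25]   # illustrative eigenvalues of the innovation covariance
xi, deltas = reverse_water_filling(lambdas, D=1.5)
# Test-channel gains eta_i = 1 - delta_i / lambda_i.
etas = [1.0 - d / lam for d, lam in zip(deltas, lambdas)]
# Classical parallel Gaussian rate in nats (an assumption of this sketch).
rate_nats = 0.5 * sum(math.log(lam / d) for lam, d in zip(lambdas, deltas))
```

In this example the weakest component lies below the water level, so it receives distortion equal to its own variance, gain zero and no rate, while the two stronger components share the remaining distortion equally.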



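Define $\Gamma_t \triangleq E_t K_t$. Then $\{\Gamma_t : t \in \mathbb{N}^n\}$ is an orthogonal process. Let
$\{\tilde{\Gamma}_t : t \in \mathbb{N}^n\}$ denote its reconstruction and define
$d_{0,n}(\Gamma^n, \tilde{\Gamma}^n) \triangleq \frac{1}{n+1} \sum_{t=0}^{n} \|\Gamma_t - \tilde{\Gamma}_t\|^2$.
Then by [20],

$$\inf_{\overrightarrow{P}_{\tilde{\Gamma}^n|\Gamma^n} :\ E\{d_{0,n}(\Gamma^n, \tilde{\Gamma}^n)\} \le D} I(\Gamma^n \to \tilde{\Gamma}^n)$$

has a solution

$$\overrightarrow{P}^{*}_{\tilde{\Gamma}^n|\Gamma^n}(d\tilde{\gamma}^n|\gamma^n) = \otimes_{t=0}^{n} P^{*}_{\tilde{\Gamma}_t|\Gamma_t}(d\tilde{\gamma}_t|\gamma_t)\text{-a.s.}$$

where $P^{*}_{\tilde{\Gamma}_{t,i}|\Gamma_{t,i}} \sim N(\eta_{t,i}\gamma_{t,i}, \eta_{t,i}\delta_{t,i})$, $\eta_{t,i} \triangleq 1 - \frac{\delta_{t,i}}{\lambda_{t,i}}$, $i = 1,\ldots,p$, and the
corresponding rate is $\frac{1}{n+1} \sum_{t=0}^{n} \frac{1}{2} \sum_{i=1}^{p} \log\frac{\lambda_{t,i}}{\delta_{t,i}}$. Thus, the pre-encoder can be further
scaled by $\Gamma_t = E_t K_t$, and $\Gamma_t$ is compressed by $A_t = \mathcal{A}_t \Gamma_t$ and sent through
an additive white Gaussian noise (AWGN) channel with feedback, after which
the received signal is decompressed by $\tilde{\Gamma}_t = \mathcal{B}_t B_t$ at the pre-decoder. By the
knowledge of the channel output at the decoder, the mean square estimator
$\hat{X}_t$ is generated at the decoder (and encoder because $\hat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\}$).
The complete design is illustrated in Fig. 5. Next we pick a specific AWGN
channel.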

Scalar AWGN Channel. Consider a scalar channel $B_t = A_t + Z_t$, $t \in \mathbb{N}$,
where $Z_t$ is Gaussian zero mean, $Q \triangleq Var(Z_t)$, and $A_t \in \mathbb{R}$. We can design
$\{(\mathcal{A}_t, \mathcal{B}_t) : t \in \mathbb{N}\}$ by



'«iP 



Bt 



'anPtX 



.1 tr 



t,l, 



'0tpPt\t t p 



t G N 



(22) 



where $\sum_{i=1}^{p} \alpha_i = 1$.

Figure 5: Design of Realizable Nonanticipative Rate Distortion Function
(blocks: Partially Observed System, Encoder with Innovation Generator, Scalar Channel, Decoder)
Note that 



H, 



B t A t 

y/aiPtX-. 



tr 



H.l 



a/ ®pPtXt,p 

a i 



'aiPt 



t aiP 



\,p 



'OtpPt 



A 



V A 



A t i 



BpXp 



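Therefore,

$$\tilde{\Gamma}_t = H_t E_t K_t + \mathcal{B}_t Z_t, \quad \Gamma_t = E_t K_t, \quad t \in \mathbb{N}.$$

By pre-multiplying $\tilde{\Gamma}_t$ by $E_t^{tr}$ we can construct

$$\tilde{K}_t = E_t^{tr} \tilde{\Gamma}_t = E_t^{tr} H_t E_t K_t + E_t^{tr} \mathcal{B}_t Z_t, \quad t \in \mathbb{N}.$$

The reconstruction of $Y_t$ is given by the sum of $\tilde{K}_t$ and $C\hat{X}_t$ as follows.

$$\tilde{Y}_t = \Psi_t(B^t, \tilde{Y}^{t-1}) = E_t^{tr} H_t E_t K_t + E_t^{tr} \mathcal{B}_t Z_t + C\hat{X}_t, \quad t \in \mathbb{N}.$$

Next, it will be shown that the desired distortion is achieved by the above
realization, while the filter of $\{Y_t : t \in \mathbb{N}\}$ is based on $\{\tilde{Y}_t : t \in \mathbb{N}\}$ given by
the reconstruction equation above.

First, we notice that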

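$$E\{(Y_t - \tilde{Y}_t)^{tr}(Y_t - \tilde{Y}_t)\} = Tr\, E\{(Y_t - \tilde{Y}_t)(Y_t - \tilde{Y}_t)^{tr}\}.$$

Then we can compute

$$E\{(Y_t - \tilde{Y}_t)^{tr}(Y_t - \tilde{Y}_t)\} = Tr\, E\{(K_t - \tilde{K}_t)(K_t - \tilde{K}_t)^{tr}\}$$
$$= Tr\, E\{(K_t - E_t^{tr}\tilde{\Gamma}_t)(K_t - E_t^{tr}\tilde{\Gamma}_t)^{tr}\}$$
$$= Tr\, E\{(K_t - E_t^{tr} H_t E_t K_t - E_t^{tr} \mathcal{B}_t Z_t)(K_t - E_t^{tr} H_t E_t K_t - E_t^{tr} \mathcal{B}_t Z_t)^{tr}\}$$
$$= Tr\, E\{((I - E_t^{tr} H_t E_t) K_t - E_t^{tr} \mathcal{B}_t Z_t)((I - E_t^{tr} H_t E_t) K_t - E_t^{tr} \mathcal{B}_t Z_t)^{tr}\}$$
$$= Tr\{(I - E_t^{tr} H_t E_t) \Lambda_t (I - E_t^{tr} H_t E_t)^{tr} + E_t^{tr} \mathcal{B}_t Q \mathcal{B}_t^{tr} E_t\}$$
$$= Tr\{(I - E_t^{tr} H_t E_t) E_t^{tr} diag(\lambda_{t,1}, \ldots, \lambda_{t,p}) E_t (I - E_t^{tr} H_t E_t)^{tr} + E_t^{tr} \mathcal{B}_t Q \mathcal{B}_t^{tr} E_t\}$$
$$= Tr\{E_t^{tr}((I - H_t) diag(\lambda_{t,1}, \ldots, \lambda_{t,p})(I - H_t)^{tr} + \mathcal{B}_t Q \mathcal{B}_t^{tr}) E_t\}$$
$$= Tr\{diag(\delta_{t,1}, \ldots, \delta_{t,p})\} = D.$$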
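The last equality in this chain can be sanity-checked component by component. The sketch below assumes, consistently with the test channel $N(\eta_{t,i}\gamma_{t,i}, \eta_{t,i}\delta_{t,i})$, that in the diagonalized coordinates the gain acts as $\eta_i$ on component $i$ and the additive noise contributes variance $\eta_i\delta_i$; this diagonal reading of $H_t$ and $\mathcal{B}_t Q \mathcal{B}_t^{tr}$ is an assumption of the sketch. Each component's mean square error is then $(1-\eta_i)^2\lambda_i + \eta_i\delta_i$, which should equal $\delta_i$. The numbers are illustrative.

```python
def component_mse(lam, delta):
    """Mean square error of a scalar test channel with gain eta and noise variance eta*delta."""
    eta = 1.0 - delta / lam
    signal_error = (1.0 - eta) ** 2 * lam   # residual source term, (I - H) Lambda (I - H)^tr
    noise_error = eta * delta               # additive noise term, B Q B^tr
    return signal_error + noise_error

lambdas = [4.0, 1.0, 0.25]      # illustrative eigenvalues lambda_{t,i}
deltas = [0.625, 0.625, 0.25]   # reverse water-filling distortions summing to D = 1.5
mses = [component_mse(lam, d) for lam, d in zip(lambdas, deltas)]
total_distortion = sum(mses)
```

Each per-component error reproduces the assigned $\delta_i$, so the trace equals the target distortion level.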



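Decoder. The decoder is $\tilde{Y}_t = \tilde{K}_t + C\hat{X}_t$, where $\{\hat{X}_t : t \in \mathbb{N}\}$ is obtained from
the modified Kalman filter as follows. Recall that

$$\tilde{Y}_t = \tilde{K}_t + C\hat{X}_t$$
$$= E_t^{tr} H_t E_t (Y_t - C\hat{X}_t) + E_t^{tr} \mathcal{B}_t Z_t + C\hat{X}_t$$
$$= E_t^{tr} H_t E_t (C X_t + D V_t - C\hat{X}_t) + E_t^{tr} \mathcal{B}_t Z_t + C\hat{X}_t$$
$$= E_t^{tr} H_t E_t C X_t - E_t^{tr} H_t E_t C\hat{X}_t + C\hat{X}_t + (E_t^{tr} H_t E_t D V_t + E_t^{tr} \mathcal{B}_t Z_t)$$

where $\{V_t : t \in \mathbb{N}\}$ and $\{Z_t : t \in \mathbb{N}\}$ are independent Gaussian vectors.
Then $\hat{X}_t = E\{X_t | \sigma\{\tilde{Y}^{t-1}\}\}$ is given by the modified Kalman filter

$$\hat{X}_{t+1} = A\hat{X}_t + A\Sigma_t (E_t^{tr} H_t E_t C)^{tr} M_t^{-1} (\tilde{Y}_t - C\hat{X}_t), \quad \hat{X}_0 = \bar{x}_0$$
$$\Sigma_{t+1} = A\Sigma_t A^{tr} - A\Sigma_t (E_t^{tr} H_t E_t C)^{tr} M_t^{-1} (E_t^{tr} H_t E_t C) \Sigma_t A^{tr} + BB^{tr}, \quad \Sigma_0 = \bar{V}_0 \qquad (26)$$

where

$$M_t = E_t^{tr} H_t E_t C \Sigma_t (E_t^{tr} H_t E_t C)^{tr} + E_t^{tr} H_t E_t D D^{tr} (E_t^{tr} H_t E_t)^{tr} + E_t^{tr} \mathcal{B}_t Q \mathcal{B}_t^{tr} E_t.$$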
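To see how the covariance recursion in (26) behaves, it helps to look at a scalar analogue. In the sketch below, which is an illustration with made-up parameters, `g` plays the role of the effective observation gain $E_t^{tr} H_t E_t C$ and `r` lumps together the two noise terms of $M_t$; both are held constant in time, which is an assumption of the sketch. Iterating the recursion from any positive start converges to the steady-state value that the infinite-horizon equation characterizes.

```python
def riccati_step(sigma, a, b, g, r):
    """One step of the scalar Riccati recursion; g is the effective observation gain."""
    m = g * sigma * g + r                       # innovation variance (scalar M_t)
    return a * sigma * a - (a * sigma * g) ** 2 / m + b * b

def steady_state(a, b, g, r, sigma0=1.0, tol=1e-12, max_iters=10000):
    """Iterate the recursion until successive values differ by less than tol."""
    sigma = sigma0
    for _ in range(max_iters):
        nxt = riccati_step(sigma, a, b, g, r)
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("Riccati recursion did not converge")

# Hypothetical scalar parameters: stable dynamics, unit process noise gain.
sigma_inf = steady_state(a=0.9, b=1.0, g=0.5, r=0.2)
```

The limit is a fixed point of `riccati_step`, and it is never smaller than the process noise variance `b * b`, since the measurement update can only remove part of the propagated uncertainty.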

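Infinite Horizon. As $t \to \infty$, under the assumption that the linear
Gauss-Markov system is stabilizable and detectable, we have the steady state version
of (26)

$$\Sigma_{\infty} = A\Sigma_{\infty} A^{tr} - A\Sigma_{\infty} (E_{\infty}^{tr} H_{\infty} E_{\infty} C)^{tr} M_{\infty}^{-1} (E_{\infty}^{tr} H_{\infty} E_{\infty} C) \Sigma_{\infty} A^{tr} + BB^{tr}$$

where

$$M_{\infty} = E_{\infty}^{tr} H_{\infty} E_{\infty} C \Sigma_{\infty} (E_{\infty}^{tr} H_{\infty} E_{\infty} C)^{tr} + E_{\infty}^{tr} H_{\infty} E_{\infty} D D^{tr} (E_{\infty}^{tr} H_{\infty} E_{\infty})^{tr} + E_{\infty}^{tr} \mathcal{B}_{\infty} Q \mathcal{B}_{\infty}^{tr} E_{\infty}$$

and $E_{\infty}$ is the unitary matrix that diagonalizes $\Lambda_{\infty}$ by

$$E_{\infty} \Lambda_{\infty} E_{\infty}^{tr} = diag(\lambda_{\infty,1}, \ldots, \lambda_{\infty,p})$$

and

$$\delta_{\infty,i} \triangleq \begin{cases} \xi_{\infty} & \text{if } \xi_{\infty} \le \lambda_{\infty,i} \\ \lambda_{\infty,i} & \text{if } \xi_{\infty} > \lambda_{\infty,i} \end{cases}, \quad i = 1,\ldots,p$$

satisfying $\sum_{i=1}^{p} \delta_{\infty,i} = D$.
Define

$$H_{\infty} \triangleq diag(\eta_{\infty,1}, \ldots, \eta_{\infty,p})$$

where $\eta_{\infty,i} = 1 - \frac{\delta_{\infty,i}}{\lambda_{\infty,i}}$. The realizable (nonanticipative) RDF can be computed
as follows.

$$R^{c}(D) = \lim_{t \to \infty} \inf \frac{1}{t+1} I_{Y^t \to \tilde{Y}^t}(P_{Y^t}, \overrightarrow{P}_{\tilde{Y}^t|Y^t})$$
$$= \lim_{t \to \infty} \frac{1}{2} \frac{1}{t+1} \sum_{j=0}^{t} \sum_{i=1}^{p} \log\frac{\lambda_{j,i}}{\delta_{j,i}}$$
$$= \frac{1}{2} \sum_{i=1}^{p} \log\frac{\lambda_{\infty,i}}{\delta_{\infty,i}}.$$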
4 = 1 
The power constraint satisfies $Tr\{E\{(A_t)^2\}\} = P_t$, $\lim_{t \to \infty} P_t = P$. Since
$A_t = \mathcal{A}_t E_t K_t$ the capacity of the channel including the encoder but not the
decoder is

$$C = \lim_{t \to \infty} \frac{1}{t+1} I(A^t \to B^t) = \frac{1}{2} \log \lim_{t \to \infty} |1 + E\{(A_t)^2\} Q^{-1}|.$$

Thus, for a given distortion level $D$, $C = R^{c}(D)$ is the minimum capacity
under which there exists a realizable filter for the data reconstruction of
$\{Y_t : t \in \mathbb{N}\}$ by $\{\tilde{Y}_t : t \in \mathbb{N}\}$ ensuring an average distortion equal to $D$. The
filter of $\{X_i : i \in \mathbb{N}\}$ or $\{Y_i : i \in \mathbb{N}\}$ is obtained for $\{\tilde{Y}_i : i \in \mathbb{N}\}$ given by
(23) or the auxiliary data $B_i = A_i(Y_i, \tilde{Y}^{i-1}) + Z_i$, $i \in \mathbb{N}$.
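The matching condition $C = R^{c}(D)$ can be turned into a power requirement. The sketch below uses the standard scalar AWGN capacity $\frac{1}{2}\log(1 + P/Q)$ in nats per channel use and inverts it; the function names and the numerical rate are illustrative assumptions, not values from the paper.

```python
import math

def awgn_capacity(power, noise_var):
    """Capacity in nats per channel use of a scalar AWGN channel."""
    return 0.5 * math.log(1.0 + power / noise_var)

def required_power(rate_nats, noise_var):
    """Smallest transmit power whose AWGN capacity equals rate_nats."""
    return noise_var * (math.exp(2.0 * rate_nats) - 1.0)

# Illustrative rate in nats, e.g. a rate distortion value for some distortion level D.
rate = 0.5 * math.log(6.4 * 1.6)
power = required_power(rate, noise_var=1.0)
```

Because the capacity is increasing in the power, any power below this value gives $C < R^{c}(D)$, in which case the target distortion cannot be met over this channel.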



7. Conclusion 

In this paper, the solution of the nonanticipative RDF is obtained on ab- 
stract spaces using the topology of weak convergence of probability measures 
and directed information. A specific example that realizes the optimal causal 
filter is presented. 



Appendix 

Proof of Theorem 3.5. The assumptions are sufficient to show lower
semicontinuity of the functional $I_{X^n \to Y^n}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ with respect to $\overrightarrow{P}_{Y^n|X^n}$
for a fixed $P_{X^n}$ [16]. Moreover, by Lemma 3.3-(2), since $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is a closed
subset of the weakly compact set $\overrightarrow{\mathcal{Q}}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, $\overrightarrow{\mathcal{Q}}_{0,n}(D)$ is also weakly
compact. Existence follows from the extended Weierstrass theorem (a lower
semicontinuous function on a compact set attains its minimum). □